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INTRODUCTION 


Consider  a  situation  in  which  we  have  a  collection  of  random 

variables  X  X  ,  ...,X  with  joint  distribution  F(x ,  ...,x  ),  and 
1  6  n  In 

several  statistics  {functions  of  these  n  random  variables),  say 

S(n)  =  g  (X  ,. .  .  ,  X  ),  T(n)  =  g  (X  . X  ),  etc.  It  is  required  to 

11  n  cl  n 

estimate  the  distributions  Fg(s),  F^t),  ...  of  these  statistics  (or  some 
characteristics  of  the  distributions)  by  obtaining  m  samples 


x,  . ,  x  x  . ,  i  =  1  ,  .  .  .  ,  m,  from  F(x  , . .  .  ,  x  )  and  hence  m 

1,16,1  n,i  1  n 

values  for  each  of  the  statistics  S{n),  T(n),  ...  .  Two  examples  of  this 

X 

type  of  situaf"on  are  as  follows: 

i)  In  testing  for  independence  in  a  time  series  {X.},  many  test 

statistics  have  been  proposed.  These  are  functions  of  finite  sets  of  the 

{X.  },  namely  X  , .  .  .  ,  X  ,  and  the  hypothesis  is  that  F(x  , , . .  , x  ) 
i  1  n  In 

n 

=  J1 ,  Fv  (x  ).  Typical  statistics  are  the  sample  serial  correlation  co- 
l-l  X.  i 
l 

efficients  with  various  delays  (lags),  i.  e. 


P  ,(n)  = 


1  ■?  ,  -.2 

5  ill  'V11 


t  =1,2, 


n~l. 


where  x*=(x+...  +  x  )/n,  statistics  based  on  the  finite  Fourier  trans- 
1  n 

form  of  the  x  *s  which  test  essentially  that  the  spectrum  of  {X.}  is 
i  i 

flat,  and  several  non-parametric  test  statistics  such  as  those  based  on 
runs.  The  distributions  of  most  of  these  statistics  are  known  for  inde¬ 


pendent,  normally  distributed  X.'s,  but  not  when  the  assumption  of  a 


?. 


normal  distribution  is  removed.  In  testing  for  a  renewal  hypothesis 
in  series  of  events  (Cox  and  Lewis,  1966,  p.  164)  an  exponential  distri¬ 
bution  for  the  X. 's  may  be  reasonable.  The  null  distributions  of  the  test 
1 

statistics  are  then  unknown,  as  arc  the  rate  of  convergence  to  the  limit¬ 
ing  (n-— oo)  distributions  (some  of  which  are  known). 

In  examining  and  tabulating  the  finite  sample  distributions,  it  may 
be  required  to  estimate  the  distributions  of  several  of  these  test  statistics 
for  many  different  values  of  n. 

ii)  There  is  much  interest  in  analyzing  very  complex  queueing 
and  congestion  systems,  particularly  those  that  arise  in  computing  and 
communication  contexts.  Here  one  might  be  interested  in  estimating  by 
simulation  the  distributions  of  the  waiting  times  at  several  points  in  the 
system  at  several  different  times  during  its  evolution. 

In  looking  at  the  problem  of  estimating  these  distributions  from 
m  replications  of  the  statistics,  several  general  problems  arise  which 
need  to  be  considered. 

First,  it  is  neither  practical  nor  desirable  to  save  all  of  the  in¬ 
formation  generated  on  a  statistic  by  the  simulation  in  the  form  of  the 
empirical  distribution  function  or,  equivalently,  in  the  form  of  the  com¬ 
plete  set  of  m  order  statistics.  Some  compact  characterization  of  the 
distribution  is  required.  In  situations  such  as  that  of  the  first  example 
given  above,  out  of  which  this  present  study  in  fact  arose,  the  character¬ 
istics  of  the  distribution  function  chosen  were  the  first  four  moments 


3 


and  sixteen  quantiles  of  the  distribution.  (A  quantile  x  of  a  distribution 

CL 

F(x)  is  defined  as  the  solution  of  the  equation 

or  =  F(xft),  0.  0  <  a  <  1.  0  r 

and  we  shall  be  assuming  throughout  this  paper  that  x  is  unique). 

The  probability  levels  chosen  for  the  quantiles  were  a  =  0.  001 ,  0.  002,  0.  005, 
0.010,  0.0?0,  0.025,  0.050,  0.  100,  and  a  -  0.900,  0.950,  0.  975,  0.980, 

0.  990,  0.995,  0.998,  0.  999.  The  choice  of  some  of  these  ar's  is  based 
on  the  levels  customarily  used  in  testing  statistical  hypotheses;  the  more 
extreme  values  have  been  chosen  rather  arbitrarily  to  characterize  the 
extreme  tails  of  the  distributions.  In  many  queueing  situations  it  is  these 
extreme  values,  rather  than  moments,  which  are  of  interest. 

Another  possible  characteristic  is  the  probability  of  the  statistic 
being  less  than  a  given  value.  These  percentiles  are  clearly  important 
in  studies  of  the  power  of  test  statistics  against  non-null  hypotheses. 

Their  estimation  is  fairly  straightforward  and  will  not  be  considered  here. 

A  second  point  concerns  the  measurement  of  statistical  efficiency 
by  either  the  variance  or  the  mean  square  error  of  the  estimator.  The 
mean  square  error  is  the  variance  of  the  estimator  plus  its  bias  squared. 

In  large  scale  applications  of  simulations,  as  treated  in  this  study,  it  is 
important  to  obtain  internal  assessments  of  the  variability  of  the  estima¬ 
tion  procedures  by,  for  instance,  estimating  a  quantile  as  the  average  of 
r  estimates  from  samples  of  size  m  .  where  r  m  =  m.  The  sample 
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1  ^2 

standard  deviation  calculated  from  the  r  estimates,  divided  by  (r  )  '  , 

then  estimates  the  sampling  standard  deviation  of  the  quantile  estimator  (see 
Mosteller  and  Tukey,  1968,  for  more  details).  In  order  to  assess  the 
internal  variability  of  the  estimation  in  this  way  the  bias  of  the  estimator 

must  be  small  compared  to  the  standard  deviation  of'the  estimator. 

1 

Otherwise  we  obtain,  from  one  point  of  view,  a  significant  bias  component 
in  the  mean  square  error,  or  from  another  point  of  view,  an  estimate 
with  very  small  sampling  variance  of  the  wrong  quantity,  i.  e.  ,  true 
quantile  plus  bias. 

Bias  thus  becomes  a  very  important  factor  in  assessing  the 
quantile  estimates  discussed  in  this  paper. 

A  third  consideration  is  that  in  some  cases  one  can  find  particular 
properties  of  a  statistic  whose  distribution  is  to  be  estimated  that  allow 
one  to  obtain  estimates  that  are  more  ’’efficient"  than  those  obtained  by 
straight  synthetic  sampling.  By  "efficiency"  here  we  mean  statistical 
efficiency,  or  the  relative  variances  or  mean  square  errors  of  different 
estimating  procedures.  However  there  are  other  less  tangible  costs  in¬ 
volved  in  simulation.  One  is  the  time  involved  in  deriving  a  particular 
procedure  for  a  given  statistic,  another  the  time  involved  in  programming 
and  debugging  such  a  procedure  and  the  delay  in  obtaining  results.  Still 
another  cost  is  the  actual  computing  time  involved  though  this  is  seldom 
mentioned  in  the  statistical  literature.  (This  latter  point  will  become 


clearer  later  in  the  paper.)  These  less  tangible  costs  are  an  important 
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component  of  overall  computational  efficiency.  The  point  of  view  taken 
in  this  paper  is  that  the  rapidly  accelerating  availability  of  large  memory, 
high  internal  speed  computers  makes  it  usually  more  "efficient",  in  the 
general  computational  sense,  to  forego  using  special  techniques  for  each 
particular  statistic  and  to  use  very  simple  straightforward  simulation 
techniques.  Thus  a  criterion  for  the  quantile  estimation  techniques  dis¬ 
cussed  here  is  programming  and  computing  simplicity. 

We  do  not  mean  to  imply  by  this  that  all  sophisticated  statistical 
and  Monte  Carlo  techniques  are  not  useful  or  applicable.  Global  techniques 
such  as  jackknifed  estimates  (Quenouille,  1956)  or  the  use  of  variance 
reduction  techniques  such  as  control  variables  (Gaver,  1969)  can  be  used 
with  the  quantile  estimation  methods  discussed  in  this  paper.  The  jack¬ 
knife  technique  will  be  discussed  in  the  next  section  and  the  use  of  the 
quantile  estimation  techniques  in  the  context  of  sophisticated  Monte  Carlo 
will  be  discussed  elsewhere. 

Finally,  it  is  perhaps  worthwhile  to  give  some  idea  of  the  numbers 
envisioned  in  connection  with  the  estimation  procedures.  Clearly  no 
scheme  involving  only  raw  simulation  will  work  satisfactorily  in  estimat¬ 
ing  a  0, 001  quantile  (x^  qq^)  unless  the  number  m  of  replications 
involved  is  substantially  greater  than  1000.  Typically  in  the  COMPSTAT 
program  (Goodman  and  Lewis,  1972),  for  which  these  techniques  were 
developed,  replications  of  100,  000  or  more  are  common.  These  are  not 
excessive  on  the  IBM  360/91  on  which  the  runs  were  made.  In  addition, 
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the  large  core  memory  of  this  machine  enabled  us  to  run  sampling  experi¬ 
ments  on  up  to  31  statistics  simultaneously. 

The  following  discussion  uses  as  examples,  for  the  most  part, 
the  extreme  0,001  and  0.999  quantiles.  The  techniques  are  still 
useful,  though  not  as  much  so,  for  the  inner  quantiles.  Some  discussion 
of  the  simultaneous  estimation  of,  for  example,  all  sixteen  quantiles 
listed  above  is  given.  The  dependence  of  the  utility  and  efficiency  of  the 
various  quantile  estimation  schemes  on  the  particular  quantile  or  set  of 
quantiles  plus,  for  example,  variations  in  the  complexity  of  computation 

of  the  series  X  ,..,,X  and  the  statistics  S(n),  T(n),,..  indifferent 
In 

problems  make  it  difficult  to  be  dogmatic  about  the  relative  utility  of 
various  quantile  estimation  schemes.  In  addition,  most  of  the  results 
required  are  finite  sample  results.  These  are  difficult  to  obtain  analytic¬ 
ally  and  expensive,  as  yet,  to  obtain  computationally, 

II.  QUANTILE  ESTIMATION 
a)  Overall  Considerations 

Two  general  methods  of  quantile  estimation  are  considered,  one 
based  on  the  order  statistics  of  the  sample,  the  other  based  on  stochastic 
approximation  (Robbins-Monro)  t  ehniques  (see,  e.  g.  ,  Robbins  and 
Monro  [1951],  Hodges  and  Lehrru  i  [1956],  Cochran  and  Davis  [1965]), 

For  simplicity  we  drop  the  notation  which  indicates  de¬ 
pendence  on  n  and  write  S(n)  as  S  aud  write  its  distribution 
function  simply  as  F(s).  A  collection  of  m  independent  random 
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variables  with  distribution  F(s)  will  be  denoted  by  S  . S.,,,,  ,S 

1  1  m 

and  the  corresponding  order  statistics  by  S .  .  <  S .  .  <  . . .  <S...<...<S  . 

(1)  (2)  (i)  (m) 

The  order  statistic  estimator  of  the  a-quantile  s  ,  where 

a  =  F(s  ),  is 
a 


aa  S([om])’ 


(2.1) 


where  [a  m]  denotes  the  integral  part  of  a  m.  Thus  for  a  -  0.  999  and 


m  10,000,  80t  999  *  S(9990). 

The  stochastic  approximation  estimate  s  (m)  is  defined  to  be  t>'e 


m  th  value  in  the  sequence  defined  by 


s  (i)  (i-U-  7 

a  a  l 


1  -  sgn 


1  ..  . 

: - «J  (>  *  i . 


2, . . .  |  ni) ,  ( Z,  2 ) 


where  sgn  x  *  1  if  x  >  0  and  -1  if  x  <  0,  and  s^(  0)  is  an  arbitrary 

initial  value.  If  the  constant  C  is  chosen  to  be  l/f(s  ),  where  f(s  )  is 

a  a 

the  density  associated  with  F(s)  evaluated  at  the  quantile  s  ,  then  the 

a 

asymptotic  variance  (m-*oo)  of  a  (m)  is  minimized.  In  fact, 


K  (s  (m)  }  s 
a  a 


(2.3) 


and 


vai 

m  f  ( s  ) 
a 


(2.  4) 


Remarkably,  the  estimate  s  has  the  same  asymptotically  normal 

a 

distribution  as  does  s^(m).  Results  on  rates  of  convergence  are  known 
for  but  not  for  £^(m).  This  will  be  discussed  later,  but  significant 
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comparisons  can  be  made  on  the  basis  of  the  asymptotic  results  and  on 
known  computational  results. 

Order  Statistics,  s 

. .  -■ 1  "■*  -‘—t* 

-  computation  time  (to  older  the  m  realizations  of  S)  is  pro¬ 
portional  to  m  ln(m); 

-  computer  memory  required  for  the  sorting  process  is  proportional 
to  m.  (Actually,  to  am  if  a  <  \r  ,  (l-a)m  if  a  >  rr  ); 

m  L 

-  no  initial  values  or  knowledge  of  F(s)  is  required; 

-  the  rate  of  convergence  of  the  estimate  is  known. 

Stochastic  Approximation, 

-  computation  time  (binary  comparison)  is  proportional  to  m; 

-  computer  memory  required  is  proportional  to  2; 

-  initial  values  s^( 0)  and  values  for  f(s  )  are  needed,  presumably 
previous  estimates  or  guesses  based  on  prior  knowledge; 

-  no  reliable  results  known  on  the  rate  of  convergence  of  s  ,  or 

Ot 

even  of  E(s  ); 

ct 

-  it  is  not  necessary  to  know  S^  exactly,  only  that  it  is  greater 

than  or  less  than  s  (i).  This  can  be  very  advantageous  if  com- 

a 

putation.  of  S  is  time-consuming. 

In  summary,  has  very  definite  advantages  over  in  terms 

of  computational  considerations.  One  might  say  that  the  asymptotic 

relative  efficiency  of  s  compared  to  s  ,  in  terms  of  real  time  and  not 

a  ct 

sample  size  m,  is  zero.  However,  initial  values  are  needed  for  s  ,  so 
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that  the  asymptotic  results  really  beg  the  question.  Further  differences 
appear  in  terms  of  finite  sample  properties  of  the  estimators,  and  these 
are  discussed  next. 


III.  QUANTILE  ESTIMATION  -  Finite  Sample  Considerations  for  s 


It  is  well  known  (David,  1971  )  that  for  the  ordered  sample 

Sn  \  <  S/?\  <  ...  <  S...  <  ...  <  S 
(1)  (2)  (i)  (m) 


F  (s)  =  prob  {S  <  s  }  *=  2 
S(i)  (i)  k 


S.  ^)[F(s)]k  [1-F(s)]m‘k 


f  (s)  =  F»  (s) 
S(i)  S(i) 


i  [F(s)  ]i-1  [l  -F(  s)]111”1  f(s), 


-(:)> 

E<V  =E(S([0m])>  *  3.+  0(i)’ 

var  <7  }  «  var  <S  m  }  -  ♦  O  . 

L  J  m  f  ( s  )  \  m  / 


(3.1) 

(3.2) 
(3.  3) 
(3.  4) 


Asymptotic  expansions  for  E  {s^  }  are  known  (David  and  Johnson 
[1954]  and  Clark  and  Williams  [1958]),  the  first  terms  in  the  expansion 
heing 


f'(«  ) 


E  {s  }  =  - - - 

a  a  „  ,3 


a  g(l  -a) 
m+  2 


2[(f'(»))2-f(sjf  "<sj] 


bf\) 


2g(l -g) (l-2q) 

’  (m+2)  (m+3) 

(3.  5) 


where  derivatives  are  denoted  by  primes  and  powers  by  arable  numerical  exponents. 
No  precise  conditions  for  these  asymptotic  expansions  seem  to  be  known,  and 
they  must  be  used  with  care.  For  example,  if  S  is  uniformly  distributed  be¬ 
tween  0  and  l,s  •  a  and,  using  (3.  2),  E  {S  .  }  =  i/(m+l )  Ki/m)-i/[(m)(m+ 1)]. 

ot  ( i ) 
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Thi  s  if  amis  the  integer  i,  we  have 


E  { s  }  =  a  -  7— — — rr  , 

1  a  1  (m+  1) 

but  the  second  term  of  the  expansion  (3.  5)  is  zero.  Similarly,  in  the 
extreme  but  computationally  useful  case  S  =  ~Tj?y ,  where  U  is  uni¬ 
form,  we  have 


Fo-ih'  l3-6) 

a  distribution  with  an  infinite  mean.  Surprisingly,  however,  7  can  be 

a 

shown,  using  (3.  2)  to  be  an  unbiased  estimator  of  s^.  In  this  case  the 
expansion  (3.5  )  does  not  even  converge. 

An  important  consequence  of  (3.  3)  is  that  the  jackknife  technique 
(Quenouille  [1956],  Mosteller  and  Tukey  [1966])  for  eliminating  an  O(^) 
term  in  the  bias  can  be  applied  to  the  order  statistic  estimate  T  .  By 
way  of  illustration  consider  the  technique  being  applied  with  just  a  simple 
splitting  of  the  data.  Thus,  assume  m  is  even  and  7^(1)  is  the  order 
statistic  estimator  from  the  first  m/2  S^s,  s^(2)  the  estimator  from 
the  second  m/2  S^'s.  The  "typical  values"  are  defined  to  be 


s  (1)  *  2s  -7(1), 
a  a  a 


%(2)  - 


(3.7) 


and  the  jackknifed  estimate  is 


7(1)+ 7  (2)  7  (1)  s  (2) 

=  a  a  ■  a  a 

8  -  - - -  -2s - -  - 

a  Z  a 


2 


2 


2 


(3.8) 
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The  O{-^0  term  in  E[¥^]  is  zero.  There  is  no  appreciable 

computing  cost  involved  in  obtaining  the  jackknifed  estimate.  The  two 

halves  of  the  sample  can  be  sorted  in  place  to  obtain  T  (1)  and  s’  (2), 

a  a 

and  then  these  sorted  halves  are  merged  to  obtain  the  complete,  sorted 

sample  S  <  ...  <S  .  This  in  fact  is  just  the  usual  binary  sorting 

(1)  (m) 

procedure  which  results  in  min  (m)  sorting  time. 

The  main  drawback  to  the  jackknife  procedure  is  that  it  may  in¬ 
crease  the  variance  of  the  estimator,  and  this  variance  is  difficult  to 
obtain  analytically.  We  have 

var  {s'  }  =  5  var  }  -  2  cov  fs  s’  (1)  }  <  5  var  {¥  } 
a  a  a  a  o  . 

The  covariance  term  involves  the  covariance  between  the  [cr(m/2)]  order 

statistic  in  half  the  sample,  and  the  [o  m]  order  statistic  in  the  whole 


sample.  If  in  fact  s  (1)  and  s  (2)  are  approximately  uncorrelated,  then 

a  a 

there  is  no  increase  in  variance.  Even  if  the  variance  is  inflated  enough 

to  make  the  mean  square  error  of  the  estimates  approximately  equal, 

*  ^ 

there  is  a  gain  in  that  the  smaller  bias  of  the  jackknifed  estimator  allows 
for  sectioning  the  complete  sample  of  size  m  into  r  smaller  sections 
of  size  m^  ,  This  gives  a  more  precise  empirical  variance  estimate 
and  smaller  computation  time,  the  latter  following  since  shorter  sections 
are  sorted. 

Unfortunately,  there  is  some  evidence  (Miller,  1964)  that  jack¬ 
knifing  is  a  poor  technique  to  use  with  estimators  involving  extreme  order 


(3.9) 


¥ 


statistics.  We  give  now  an  illustrative  example. 

Example  -  The  exponential  distribution 

Consider  the  estimation  of  the  0,  999  quantile  s  for  a  unit 

vi  77  7 

exponential  distribution.  For  simplicity  we  assume  that  the  sample 
size  m  is  such  that  0.  999  m  is  an  integer  k.  We  have 


F(s)  -  1  -  e  , 


.-1 


s  «  F  (y)  -  -  1  n  ( 1-y)  , 


s  =  -  In  ( 1-a) , 
a 


By  direct  methods  (Cox  and  Lewis,  1966)  one  gets 


E{\}“  E{S(k)}.  -ln<l-a)  {fy* 


1 

(1-a)  -  1 

12m2 

(1-a) 

var  {s  }  =  — - — - 
a  m(  1-a) 


2m 


l_  r_4 .  n+ 0[-v] 

i  [U“°>  J  Vm3/ 


(3.1 


(3.1 


These  results  are  used  to  give  Table  1.  The  ratio  (column  8)  of 
the  standard  deviation  (  a,  column  7)  of  the  estimate  to  the  bias  of  the 
estimate  (column  3)  indicate^  roughly  how  feasible  it  is  to  use  averages  of 


estimates  s  from  samples  of  size  m  to  estimate  ■  more  pre- 

0.999  r  0.999 


cisely,  along  with  an  estimate  of  the  sampling  variance  of  that  estimate. 

Thus  36  samples  of  size  m  ■  10,000  produces  an  estimate  with  a  standard 
deviation  approximately  equal  to  the  absolute  value  of  the  bias,  clearly  an 
undesirable  situation.  Moreover  the  bias  is  -0.051,  so  that  this  estimate 
gives  us  accuracy  of  at  best  two  decimal  places.  This  would  not  be  acceptable 


in  many  cases 


100,000  -0,  166x10  6,000.  0  J,  00  0.0111  (0.000269) 
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The  leading  term  of  the  bias  of  the  jackknifed  estimate  (3.  8) 
is  shown  in  Table  2  (column  2)  as  a  function  of  m.  Column  3 
shows  the  ratio  of  the  standard  deviation  of  the  unjackknifed  esti¬ 
mator  to  this  bias.  Clearly,  sectioning  and  averaging  is  much  more 
feasible.  However,  as  indiated  in  column  4  an  increase  by  only  a 
factor  of  1.  03  in  the  variance  of  Tg  makes  the  mean  square 

error  of  this  estimate  as  large  as  that  of  s^  ^  at  m  «  10,000. 
Clearly,  jackknifing  is  of  greater  utility  in  the  range  of  m  from  about 
1, 000  to  5,000,  if  the  variance  does  not  blow  up  and  moreover,  it 
is  desirable  to  use  sections  as  short  as  this  if  possible,  to  reduce 
computation  time. 

Estimates  of  the  variance  of  T  obtained  by  synthetic 

sampling  are  given  in  column  5  of  Table  2.  The  quantities  in  paren¬ 
theses  are  estimates  (63  degrees  of  freedom)  of  the  standard  deviation 
of  these  variance  estimates.  There  is  enough  increase  in  the  vari- 
ance  over  the  unjackknifed  situation  to  characterize  the  gain 
from  using  the  jackknife  as  being  marginal  rather  than  categori¬ 
cal  with  this  type  of  quantile  estimate.  For  the  exponential  case, 
however,  detailed  calculations  on  computation  times,  bias,  etc.  , 


with  sections  of  length  approximately  m  *  5,  000. 
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An  alternative  scheme  is  to  use  as  a  "typical  value"  an 

order  statistic  estimate,  s*  ,  from  a  small  independent  sample 

of  size  m/I.  Then  the  estimate  (s*  -IT  )/(l-l)  has  a  mean 

a  a 

value  free  of  the  1/m  term  and  a  variance  decreasing  to 

var  {"S'  }  as  i  increases.  For  f  =  10  the  variance  is 
a 

1.  36  var  {s  };  for  I  *  20  it  is  1.  13  var  {T  },  an  in- 
a  a 

crease  smaller  than  that  found  for  TT^  .  However,  this 
technique  is  difficult  to  use  for  an  m  much  less  than  10,000 
in  estimating  a  0.  001  quantile. 


IV.  QUANTILE  ESTIMATION  -  Finite  Sample  Considerations  for  $ 

a 

Experience  shows  that  for  extreme  quantiles,  convergence 
of  s^(m)  *8  80  81°W  as  to  be  unacceptable.  Though  problems 
might  be  anticipated  in  the  tails  of  extremely  skew  distributions, 
e.  g.  ,  the  distribution  F(s)  *  s/(l+s)  discussed  above,  they 

occur  elsewhere  too,  as  can  be  seen  roughly  in  the  following 

<«  * 

example. 
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Example  -  The  unit  exponential  distribution 

Consider  again  the  unit  exponential  and  the  a  =  0.  001  quantile. 

This  is 

8o.  001  =  - in^  -  °‘001>  ~0-001- 


We  have 

f ( s  1  ■  -0.001 

f'  0.001 '  1  * 

Assume  the  initial  estimate  is  s^  qqj(0)  =  001»  and  that  is  less 

than  0.  001,  This  has  probability  0.  001,  Then 


So.ooi(1)!5-ll£  0-999  * 


Clearly  successive  S.'s  are  greater  than  the  estimates  until  the  esti¬ 
mates  get  back  to  zero.  After  f  steps  the  estimate  will  have  moved 

I  . 

0.001  x  1  x  1/i  in  the  positive  direction  and  since 


I  1  1 

.Z.  0.  5772+  in(f)+  —  , 

i  —  l  i 

«  408 

the  return  to  the  origin  takes  about  f  =  10  steps.  By  following  through 
this  example  it  can  be  seen  that  the  estimate  is  almost  sure  to  become 
negative  and  take  an  enormously  long  time  to  return  to  zero. 

The  jackknife  technique  is  not  suitable  as  a  means  of  overcoming 
this  difficulty,  partly  because  the  order  of  the  leading  term  in  the 


asymptotic  expansion  for  the  bias  is  not  known,  but  also  because  if  the 
correct  jackknifing  technique  was  applied  (possibly,  for  a  m  term)  the 
example  makes  it  clear  that  the  estimator  would  still  be  unsuitable. 

Two  other  techniques  suggest  themselves,  Kesten  [1958]  seems 
to  be  one  of  the  few  authors  to  have  noted  the  problem  of  long  runs  occur¬ 
ring  in  the  stochastic  approximation  scheme.  He  suggested  higher-order 
memory  schemes,  for  example,  not  increasing  the  divisor  of  C  in  (2.2) 
by  one  if  all  p  previous  differences  of  sample  values  and  estimators 
were  of  the  same  sign.  Although  this  would  clearly  help  in  the  example, 
it  violates  the  need  for  simplicity.  In  fact,  the  divisors  in  the  estimators 
of  different  quantiles  for  different  statistics  become  different,  and  this 
creates  very  complex  and  time  consuming  programming  procedures. 

Another  technique  is  to  bound  the  estimator  s^(i).  Thus  requir¬ 


ing  s^(i)  —  0  in  the  example  would  have  obviated  the  problem,  but  uses 

specific  information  about  the  statistics.  Empirically  derived  bounds 

can  be  obtained  at  the  same  time  that  initial  values  s  (0)  and  f(s  ) 

a  a 

are  obtained.  For  example  a  small  pilot  run  using  order  statistics 
will  give  these  initial  estimates  and  bounds  for  outer  quantiles 
(.  001  and  .999)  from  the  known  properties  of  the  order  statistics. 
Inner  quantiles  are  bounded  by  outer  quantiles.  Thus  it  has  been  found 
empirically  that  if  A=  j  bound  -  s"^(0)  |, then  when  C/A>  100,  the 
stochastic  convergence  scheme  works  reasonably  well.  It  is,  however, 
ponderous  when  compared  to  the  quantile  estimation  scheme  discussed 


in  the  following  section, 
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V,  THE  MAXIMUM  (OR  MINIMUM)  TRANSFORMATION 

It  is  natural  to  look  for  a  computationally  simple  transformation 
of  the  data  to  overcome  some  of  the  problems  of  estimating  extreme 
quantiles,  and  the  following  transformation  appears  to  solve  most  of  the 
computational  and  statistical  problems. 

Assume,  e.  g.  ,  that  a  >  1/2,  and  consider  the  first  v  S's,  i.  e.  , 

Sj  , .  ,  .  ,  Sv.  Then  the  distribution  *-<•)  of  the  maximum  of  the  v  S's  is 

T(s)  =  prob  {(  maxy  S.)  <s}  =  {F(s)}V.  (5.1) 

Note  that 

F(sff)  =  FV(sa)  =  aV=  a'  (5.2) 

and 


s 

a 


(5.3) 


so  that  the  o-quantile  of  F(s)  is  the  same  as  the  o'  =  aV  quantiles  of 

F(s).  If  we  assume  that  v  divides  m,  and  m'  =  m/v,  taking  maxima 

* 

of  successive  groups  of  v  S's  gives  a  reduced  sample  of  size  m',  i.  e,  , 


S'  ,  . 
m 


Thus  we  have  a  reduced  sample  and  have  transformed  the  problem  from 
estimating  an  extreme  quantile  (for  level  a)  to  estimating  a  more  reasonable 

V 

quantile  (for  level  a  ).  For  example,  one  might  take  v  large  enough  to  re¬ 
duce  the  problem  to  estimating  a  median.  Since  qV  =  1/2  in  this  case, 
v=  (In  l/2)/(ln  a ).  The  consequent  values  of  v  are  shown  for  eight  or's 


in  Table  3. 
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Table  3 

v 

Size  of  v  for  a  =1/2 

a  0.900  0.950  0.97  5  0.  980  0.990  0.995  0.998  0.999 

v  6.  6  13.  5  27.  4  34.  3  69.  0  138.  3  346.  2  692.8 

Some  general  points  about  the  use  of  the  transiort.ici.tion  in  quantile 
estimation  follow. 

(1)  The  transformation  of  the  original  sample  {S.  }  into  the  reduced 
sample  {Sj  }  is  very  efficient  computationally  since  only  v 
binary  comparisons  and  one  memory  cell  are  needed  to  obtain 

Sj  from  , . .  .  ,  S^  . 

(2)  The  transformation  uses  no  special  information  about  the 
properties  of  S,  e,  g.  ,  S  >  0. 

(3)  Although  transforming  to  the  median  is  not  necessarily  optimal, 
it  is  known  that  stochastic  approximation  estimates  of  the 

median  work  well  (Ccchran  and  Davisf  1965). 

«  ' 

(4)  If  a  <  1/2  the  minimum  S"  of  the  v  S's  is  used  instead. 

VI.  THE  MAXIMUM  TRANSFORMATION:  ASYMPTOTIC  VARIANCE 

Using  (2.  4)  we  now  compare  the  asymptotic  variance  of  a  quantile 
estimate  (order  statistic  or  stochastic  approximation)  s^j  from  the  re¬ 
duced  sample  with  that  of  the  quantile  estimate  s  from  the  original  sample. 


We  have,  from  (5.  1), 

I(s)  «  v  f(s)  (F(s)  }V_1 
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Thus  the  statistical  efficiency  of  the  estimate  is  decreased  by 
approximately  1/(1,443),  but  the  speed  of  the  maximum  transformation,  com¬ 
pared  to  the  computations  involved  in,  for  example,  the  stochastic  approximation 

estimate  would  probably  make  their  computational  efficiency  about  equal. 

By  computational  efficiency  we  mean  the  relative  computing  times  required 
to  achieve  the  same  variance.  Both  estimates  (reduced  and  non-reduced 
samples)  are  asymptotically  unbiased. 

The  variation  of  g(o;v)  with  v  for  a  =  0,999  is  shown  in  Table  5. 

Table  5 

Variation  of  g(a;v)  with  v  at  a  =  0.999 


V 

10 

50 

100 

200 

500 

700 

1000 

a' 

0.  990 

0.951 

0.  905 

0.819 

0.  606 

0.  496 

0.368 

g( 0.  999;v) 

1.005 

1.  025 

1.  051 

1.  107 

1.  297 

1.  448 

1.718 

The  choice  of  v  in  particular  situations  depends  on  computational 
considerations,  particularly  the  meshing  of  estimates  for  various  a's 
for  a  given  statistic,  and  although  no  global  results  can  be  given,  we  discuss 
these  considerations  in  the  next  two  sections, 

VII.  THE  MAXIMUM  TRANSFORMATION:  COMPUTATIONAL  CONSIDERATIONS 
(i  )  Order  statistics 

The  use  of  the  maximum  transformation  with  order  statistic 
quantile  estimates  gives  very  little  gain.  Bias  is  not  an  extreme  considera¬ 
tion  here,  and  computations  for  several  examples  with  the  asymptotic  ex- 
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pansion  (3.5)  show  that  there  is  very  little  change  in  small  sample  bias 

from  one  estimate  to  the  other.  Memory  size  is  not  much  affected.  For 

a=  0.999  we  require  0.  001  m  memory  cells  with  the  original  sample; 

1  X  m 

using  v  =  693  to  give  a*  #  1/2  and  a  reduced  sample,  we  need  —  m'  =  --  *  - 

L  C  0  73 

memory  cells.  The  slight  advantage  is  lost  when  it  is  noticed  (6,  2)  that  a 
larger  sample  size,  m,  is  required  to  achieve  equivalent  variance. 

If  the  eight  quantiles  with  a's  given  in  Table  3  are  required,  the 
unreduced  sample  ordering  (done  at  one  time  for  all  eight  quantiles)  re¬ 
quires  0.  1  m  memory  cells.  The  eight  quantiles  require  eight  separate 

,  ,  ...  .  1  m  1  m  1  m 

reduced  samples  with  memory  requirements  -  ~r  ,  —  — r  ,  • .  •  ,  rr  , 

l  0/3  b  3*1  1 

or  a  total  roughly  equivalent  to  0.  1  m. 

The  time  gain  from  sorting  smaller  samples  is  again  marginal. 

(ii)  Stochastic  approximation 

The  greatest  gain  in  using  the  maximum  (or  minimum)  scheme 
comes  from  the  reduction  of  bias  with  the  stochastic  approximation  esti¬ 
mator  (2.  2)v  There  are  other  gains;  the  maximum  operation  is  faster  than  the  , 
the  computation  (2,2),  and  changing  v  and  a'  is  very  simple.  One  could 
do  the  eight  quantiles  of  Table  3  from  eight  reduced  samples  obtained  from 
v's  of  7,  14,  28,  35,  70,  140,  350,  700.  These  values  are  close  to  the 
v's  given  in  Table  3  and  all  divide  7  00.  Thus  it  is  easy  to  compute  the 
eight  reduced  samples  simultaneously  in  nested  loops. 


Again  only  a  fixed  number  of  memory  locations  is  required. 
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A  possible  scheme  for  selecting  initial  values  for  the  stochastic 

approximation  estimator  is  the  following.  Let  S^,  .  .  .  ,  S1^!  be  the 

reduced  sample.  Take  the  first  three  values,  S^,  and  order 

them  as  S*  ,  S*  ,  S*  .  If  v  is  such  as  to  make  o' «  1/2,  use  S* 

(1)  (3) 

as  s'  |  (0),  the  initial  value,  and  since  S^j  and  estimate  the 

1  3 

—  and  —  quantiles  of  T(s),  estimate  f(s^,)  as: 


4  s,ufs<i) 


4S(3,-S,(2, 


(This  is  the  simplest  of  many  density  estimates.  ) 
The  stochastic  approximation  estimate 


V  (i)  “  vt1”1)  “7 


c  1-sgn  {s!  -  sa,(i-l)} 


(7.1) 


i  *  1,2,. .  .  , m,  (7.  t) 


uses  s'^  as  the  S* ,  i=l,  in  the  equation. 


The  value  of  C  is  not  critical  and  this  estimate  should  be  well- 
behaved.  The  jackknife  technique  (3.  8)  can  be  applied,  using  alternate 
values  for  the  sub-estimates,  although  it  is  not  known  if  the  leading 

term  in  the  bias  is  0(l/m). 

Analytical  results  for  s  .  and  the  jackknifed  estimate  s  t.  cannot 

a  a 

be  obtained  and  sampling  results  are  given  in  Tables  6,  7  and  8  for  S 
having  a  unit  exponential  distribution  and  S  having  the  distribution 
F(  s)  «  s/(  1+s) . 


Results  are  given  in  Tables  6,  7  and  8  from  an  extensive 
simulation  to  investigate  the  properties  of  quantile  estimates  based  on 
(7.  1)  and  (7.  2).  The  columns  in  these  tables  show  successively  m, 
the  sample  size;  m',  the  reduced  sample  size;  the  expected  values 
and  standard  deviations  of  the  jackknifed  estimator  based  on  (7.  1) 
and  (7.  2)  and  the  expected  values  and  standard  deviations  of  the  straight¬ 
forward  estimator  s^,  based  on  {7.  1)  and  (7  2).  The  last  column 
in  the  Tables  gives  the  asymptotic  standard  deviations  from  (6.  2).  Note 
that  v  «  7  00.  To  indicate  the  precision  obtained  in  the  sampling  ex¬ 
periment  the  quantities  in  parentheses  in  the  last  row  of  each  table  are 
estimates  (7  degrees  of  freedom)  of  the  standard  deviations  of  the  estimates. 

In  Table  6,  the  minimum  transformation  gives  estimates  of 
*0  001  *  ^01001,  the  0.  001  quantile  of  the  unit  exponential  distribu¬ 

tion.  The  jackknifed  estimator  7  i,  where  a1  «  1  -  [1-0.  001]^^,  has 
converged  by  the  time  m  reaches  roughly  30,000.  There  is  still  a 
small  bias  in  the  unjackknifed  estimator  s^( ,  approximately  one  tenth 
of  the  standard  deviation  of  the  estimator  at  m  *  28,000.  There  is  a 
penalty  paid  in  terms  of  a  larger  standard  deviation  for  the  jackknifed 
estimator  (  20%)  and  the  standard  deviation  for  IT^,  is  larger  than 

predicted  by  the  asymptotic  formula  (6.  2).  Thi  s  is  due  tc  the  added 
variability  introduced  by  the  density  estimate  (7.  1)  and  the  finite 
sample  size  m. 
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TABLE  6 


Estimation  of  0.  001  quantile  with  minimum  transformation 
and  stochastic  approximation,  with  (s^,)  and  without  (s^i) 
jackknife,  a 1  *  l-[l-0.001]v,  v«700.  Unit  exponential 
distribution. 

F{s)  ■  1  -  e"S;  F(s)  -  1  -  [1 -F(s)f;  8()  ^  -  0. 001  001 


m 

m* 

i 

E<V> 

1 

st.  dev.  (s  ,) 
a’ 

E<v> 

st.  dev.  (s'  ,) 
or 

Asymptotic 
st.  dev. 

4200 

6 

} 

*  0.  00107 

0.  00095 

0.  00111 

0.  00065 

0.  000587 

5600 

8 

0.  00106 

0.  00081 

0.  00109 

0.  00058 

0.  000509 

7000 

10 

0.  00104 

0.  00073 

0.  00108 

0.  00053 

0.  000455 

8400 

12 

0.  00103 

0.  00067 

0.  00107 

0.  00050 

0.  000415 

9800 

14 

0.  00103 

0.  00062 

0.  00106 

0.  00047 

0.  000385 

11200 

16 

0.  00102 

0.  00058 

0.  00106 

0.  00045 

0.  000360 

12600 

18 

0.  00102 

0.  00055 

0.  00105 

0.  00043 

0.  000339 

14000 

20 

0.  00102 

0.  00053 

0.  00105 

0.  00042 

0.  000322 

16800 

24 

0.  00101 

0.  00049 

0.  00104 

0.  00040 

0.  000294 

19600 

28 

0.  001008 

0.  000456 

0.  001034 

0.  000378 

0.  000272 

28000 

40 

0.  001003 

0.  000400 

0.  001025 

0.  000344 

0.  000228 

42000 

60 

0.  000999 
(0.  000001) 

0.  000346 
(0.  000001) 

0.  001016 
(0.  000001) 

0.000310 
(0.  000002) 

0.  000186 

TABLE  7 


Estimation  of  0.999  quantile  with  maximum  transforma¬ 
tion  and  stochastic  approximation,  with  (s'  t)  and  v/ithout 
(s  t)  jackknife,  a*  g  0.  999v,  v  ■  7 00.  Un?t  exponential 


a 

distribution. 


F(s)  -  1  -  e“8;  F(s)  =  [F(s)]v;  sQ  ^  «  6.908 


6.946 

6.  942 

6.931 

6.929 

6.926 

6.923 

6.920 

6.919 

6.919 

6.  916 

6.  91? 

6.  912 

(0.  001) 


st.  dev.  (s'  ,)  I  E^  .) 


0.  970 

0.  825 

0.736 

0.  670 

0.  619 

0.  581 

0.  548 

0.  520 

0.  481 

0.  448 

0.  384 

0.  332 

(0.  001) 


6.  962 

6.  957 

6.  949 

6.  946 

6.  942 

6.  940 

6.  936 

6.  934 

6.  933 

6.  929 

6.  923 

6.  920 

(0.  001) 


st.  dev.  (if  ,) 


0.  640 

0.  569 

0.  522 

0.  488 

0.  461 

0.  440 

0.  423 

0.  406 

0.  386 

0.  365 

0.  326 

0.  294 

(0.  001) 


Asymptotic 
st.  dev. 


0.  587 
0.  508 
0.  455 
0.  415 
0.  384 
0.  359 
0.  339 
0.  321 
0.  293 
0.  272 
0.  227 
0.  186 


TABLE  8 


Estimation  of  0.  999  quantile  with  maximum  transforma¬ 
tion  and  stochastic  approximation,  with  (s  ,)  and  without 
(s^,)  jackknife,  a*  »  0.  999v,  v  ■  700. 


F(s)  *  s/(l+s);  F(s)=  [s/(l+s)]v{  80  999  *  999 


m 

m* 

E<v> 

,  .  ■*>  . 

st.  dev.  (s  ,) 

a * 

E(V> 

st.  dev.  (s'  ,) 

Asymptotic 
st.  dev. 

42.00 

6 

1048 

2902 

1233 

1375 

587 

5600 

8 

1196 

17  29 

1217 

1233 

508 

7000 

10 

1076 

2416 

1197 

1195. 

455 

8400 

12 

1126 

1412 

1180 

1042 

415 

9800 

14 

1063 

1422 

1163 

872 

384 

11200 

16 

1076 

1078 

1148 

80? 

359 

12600 

18 

105? 

1082 

1137 

738 

339 

14000 

20 

1060 

1123 

1127 

753 

321 

16800 

24 

1048 

87? 

1112 

685 

293 

19600 

28 

1042 

828 

1103 

698 

27? 

28000 

40 

1025 

7  47 

1077 

624 

227 

42000 

60 

1016 

551 

1055 

494 

186 

(  1) 

(  23) 

(  1) 

(  28) 

k 
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In  Table  7  the  maximum  transformation  gives  estimates  of 
s0  999  *  6*  908,  the  0.  999  quantile  of  the  unit  exponential  distribution. 

The  bias  is  smaller  for  the  jackknifed  estimator,  and  extrapolation  of 
figures  in  column  3  gives  convergence  in  the  mean  for  most  purposes 
by  m  *  50,000.  Note,  however,  that  the  standard  deviation/bias  ratio 
is  large  ('N'100)  down  the  column.  Again,  there  is  an  inflation  in  the 
standard  deviation  of  the  jackknifed  estimator. 

The  bias  is  examined  more  closely  in  Figure  1  where  the  absolute 
value  of  the  (estimated)  bias  is  plotted  on  a  log  scale  against  m  for  both 
estimators.  The  curvature  indicates  that  higher  order  terms  than  1/m 
are  still  important  for  these  sample  sizes.  Except  for  the  m  »  42,000 
point,  the  bias  in  the  jackknifed  estimator  appears  to  be  falling  off  more 
rapidly  than  the  bias  for  s^,  (here  or’  »  0.  9997°°).  No  formal  regression 
analyses  of  these  sampling  results  have  been  done. 

Note  that  for  both  estimators  the  absolute  value  of  the  bias  (except 
for  the  last  point)  i?  less  than  the  absolute  value  of  the  bias  in  the  order 
statistic  estimator.  This  is  given  by  (3.  1)  and  plotted  as  a  sequence  of 
crosses  in  Figure  1. 

Bias  is  more  serious  for  the  extreme  case  of  the  distribution 
F(s)  ■  s/(l+s),  as  shown  in  Table  8.  The  jackknifed  estimator  is  ad¬ 
vantageous  here  where  the  standard  deviation/bias  ratio  is  smaller  than 
for  the  exponential  distribution  and  convergence  is  a  little  slower. 


BIAS 


X 


x  EXACT  BIAS  (ORDER  STATISTIC) 
o  ESTIMATED  BIAS  (sa') 

•  ESTIMATED  BIAS  (Itf) 


o 

o 

X 


10,000  30,000  50,000 

SAMPLE  SIZE,  m 


Figure  1.  Exact  bias  for  the  order  statistic  estimator  S’  t  of  the  0.999  quantile 
of  the  unit  exponential  distribution,  and  estimated  bias  for  estimators  s'  ,  and 
s'  ,  using  the  maximum  transformation  (o'  =  0.  999  ,  v  =  7  00)  and  stochastic 
approximation. 
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VIII.  CONCLUSIONS  AND  FURTHER  WORK 

The  conclusion  of  this  investigation  is  that  the  maximum  trans¬ 
formation  and  the  stochastic  approximation  scheme  (7.  2)  yields  a 
quantile  estimator  for  extreme  quantiles  (e.  g.  ,  x^  )  which  is  fast 
and  linear  in  sample  size  mj  requires  a  small,  fixed  amount  of  storage 
and  without  using  external  information  provides  a  virtually  bias  free 
estimate  even  in  extreme  cases  (such  as  the  s/(l+s)  distribution)  for 
sample  sizes  of  m  «  50,  000.  In  most  cases  smaller  samples  will  be 
satisfactory. 

There  are  possibilities,  based  on  the  properties  of  extreme 
value  distributions,  of  improving  the  quantile  estimators  performance 
even  more. 

Note  too  that  the  estimator  can  be  used  to  advantage  for  smaller 
a  than  0.  999,  and  it  is  completely  suited  for  use  with  global  variance 
reduction  techniques  such  as  the  use  of  control  variables  (Gaver,  1969) 
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